Behind the Beats: A Feature Analysis of Songs by Spotify’s Fab Five
Authors
Brian Kwon
Jorge Bris Moreno
Aaron Schwall
Joe Zhang
Abstract
This project examines audio features of songs by the five most streamed Spotify artists of 2022. We used the Spotify API to download information on seven audio features for each of the five artists: BTS, Bad Bunny, The Weeknd, Taylor Swift, and Drake. The features studied are Acousticness, Danceability, Energy, Loudness, Speechiness, Tempo, and Valence. We performed exploratory data analysis to inform our hypotheses, and then ran three different hypothesis tests to determine which features differ among the artists: pairwise t-tests, Wilcoxon rank sum tests, and bootstrap tests. We found no significant difference in tempo between any pair of the top five artists, while every other audio feature differed significantly for at least one pair of artists.
Introduction
This project compares audio features of the top 5 most streamed artists on Spotify in 2022 (Spangler, n.d.). The goal is to identify which features are statistically indistinguishable and which differ among those artists, as a first step toward identifying feature ranges associated with more streams. These tests cannot establish whether a feature actually drives popularity, but they are useful for comparison and analysis. If a feature does not differ between any of the artists, this suggests that there may be a preferred range for that feature, provided the feature itself actually influences the number of streams.
For this study, we use all audio features obtained from the Spotify API except Instrumentalness and Liveness. Because these artists are all singers, Instrumentalness is zero or close to zero for most of their tracks, so this feature is outside the scope of this project. Liveness indicates whether a track was performed live, which we do not expect to be related to the number of streams.
The rest of the features and their meanings are as follows (Spotify, n.d.):
Acousticness: Measures from 0 to 1 whether a track is acoustic or not (0 being the lowest).
Danceability: Measures how suitable the music is for dancing (0 being the least suitable).
Energy: Measures the perception of intensity and activity of a song from 0 to 1 (0 being the lowest).
Loudness: Measures the loudness of tracks in decibels (from -60 to 0).
Speechiness: Measures the ratio of spoken words to music in a track from 0 to 1 (0 being the lowest).
Tempo: Measures the beats per minute (referring to the pace of the track and the average beat duration).
Valence: Measures the positiveness of a track from 0 to 1 (0 being the most negative value and 1 being the most positive value).
Our hypothesis tests compare each feature pairwise across all pairs of artists. For every feature, the null hypothesis is that there is no difference in that feature between the two artists in a pair, while the alternative hypothesis is that there is a statistically significant difference between the two artists in that feature.
The hypotheses can be written as follows:
\(H_0: \forall f\in F [\forall a,b \in A, a \neq b [\mu_{f,a} - \mu_{f,b} = 0]]\)
\(H_A: \exists f\in F [\exists a,b \in A, a \neq b [\mu_{f,a} - \mu_{f,b} \neq 0]]\)
\(F\) = {\(\text{Acousticness, Danceability, Energy, Loudness, Speechiness, Tempo, Valence}\)}
\(A\) = {\(\text{Taylor Swift, BTS, Bad Bunny, Drake, The Weeknd}\)}
Data Collection
To access and compare the audio features of the artists who were most streamed on Spotify in 2022, we first created a Spotify Developer account. This account gives access to Spotify’s API, which is an effective tool for retrieving different types of data from Spotify, including information about artists, tracks, and audio features. We used the get_artist_audio_features() function (available in R through the spotifyr package, a wrapper around the Spotify API), which returns the audio features of every track by a given artist. These audio features, which provide insight into the musical qualities that make these artists popular, include tempo, loudness, energy, danceability, and others. The first rows of the resulting data frame are shown below.
Artist_name Acousticness Danceability Energy Instrumentalness Liveness
1 Taylor Swift 0.009420 0.757 0.610 3.66e-05 0.3670
2 Taylor Swift 0.088500 0.733 0.733 0.00e+00 0.1680
3 Taylor Swift 0.000421 0.511 0.822 1.97e-02 0.0899
4 Taylor Swift 0.000537 0.545 0.885 5.59e-05 0.3850
5 Taylor Swift 0.000656 0.588 0.721 0.00e+00 0.1310
6 Taylor Swift 0.012100 0.636 0.808 2.18e-05 0.3590
Loudness Speechiness Tempo Valence
1 -4.840 0.0327 116.998 0.685
2 -5.376 0.0670 96.057 0.701
3 -4.785 0.0397 94.868 0.305
4 -5.968 0.0447 92.021 0.206
5 -5.579 0.0317 96.997 0.520
6 -5.693 0.0729 160.058 0.917
Track_name
1 Welcome To New York (Taylor's Version)
2 Blank Space (Taylor's Version)
3 Style (Taylor's Version)
4 Out Of The Woods (Taylor's Version)
5 All You Had To Do Was Stay (Taylor's Version)
6 Shake It Off (Taylor's Version)
Album_name Album_release_year
1 1989 (Taylor's Version) [Deluxe] 2023
2 1989 (Taylor's Version) [Deluxe] 2023
3 1989 (Taylor's Version) [Deluxe] 2023
4 1989 (Taylor's Version) [Deluxe] 2023
5 1989 (Taylor's Version) [Deluxe] 2023
6 1989 (Taylor's Version) [Deluxe] 2023
Checking NAs
cat("There is", sum(is.na(df)),"NA values.")
There is 0 NA values.
Tests
T-Test
The t-test compares two means to determine whether they differ significantly. It is based on the t-statistic, which is the difference between the means of two samples divided by the standard error of that difference. We used pairwise t-tests to test whether the mean of each feature differs between each pair of artists. The pairwise t-test returns a matrix of p-values calculated from the t-statistics for each artist pair. Each p-value is the probability, under the null hypothesis, of observing a t-statistic as extreme or more extreme than the one obtained. We used a 95% confidence level, that is, a significance level of 0.05: if the p-value was below 0.05, we rejected the null hypothesis for that artist pairing; otherwise, we failed to reject it.
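As a rough illustration of the statistic described above, the sketch below computes a pooled-variance two-sample t-statistic by hand and compares it with base R's t.test. The two vectors are made-up toy values, not Spotify data. Note that this two-group case is a simplification: pairwise.t.test with pooled SD estimates the standard deviation from all groups at once, so its p-values will generally differ from a two-sample test on the same pair.

```r
# Toy samples (hypothetical values, not from the Spotify data)
x <- c(0.61, 0.73, 0.82, 0.88, 0.72, 0.81)
y <- c(0.55, 0.60, 0.66, 0.58, 0.70, 0.62)

n1 <- length(x)
n2 <- length(y)

# Pooled variance across the two samples
sp2 <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2)

# Standard error of the difference in means
se <- sqrt(sp2 * (1 / n1 + 1 / n2))

# t-statistic: difference in means divided by its standard error
t_manual <- (mean(x) - mean(y)) / se

# Two-sided p-value from the t distribution with n1 + n2 - 2 degrees of freedom
p_manual <- 2 * pt(-abs(t_manual), df = n1 + n2 - 2)

# Reference computation with equal variances assumed
res <- t.test(x, y, var.equal = TRUE)

# Reject the null at the 0.05 significance level when the p-value is below 0.05
reject_null <- p_manual < 0.05
```

The manual values t_manual and p_manual should agree with res$statistic and res$p.value up to floating-point error.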
Wilcoxon Rank Sum Test
The Wilcoxon rank sum test is a non-parametric statistical test used to determine whether two samples differ significantly in location. As a non-parametric test, it does not assume that the data are normally distributed. It works by pooling the observations from both samples and ranking them from smallest to largest. The ranks belonging to each sample are then summed, and the test statistic is computed from these rank sums. A p-value is then calculated, which gives the probability, under the null hypothesis, of obtaining a test statistic as extreme or more extreme than the one observed; this is the value returned in our hypothesis tests. For this test, we used a 95\(\%\) confidence level, which means that if the p-value is less than 0.05, we reject the null hypothesis, while if it is greater than 0.05, we fail to reject it.
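The ranking step can be sketched on a small example. The values below are hypothetical and chosen to have no ties. The only convention assumed here is the one base R's wilcox.test uses for its reported statistic W: the rank sum of the first sample minus its smallest possible value, n1 * (n1 + 1) / 2.

```r
# Toy samples (hypothetical values with no ties, not from the Spotify data)
x <- c(0.61, 0.73, 0.82, 0.88, 0.72)
y <- c(0.55, 0.60, 0.66, 0.58, 0.70, 0.62)

# Rank all observations together; the smallest value gets rank 1
r <- rank(c(x, y))

# Sum of the ranks that belong to the first sample
rank_sum_x <- sum(r[seq_along(x)])

# W as reported by wilcox.test: rank sum of x minus n1 * (n1 + 1) / 2
n1 <- length(x)
w_manual <- rank_sum_x - n1 * (n1 + 1) / 2

# Reference computation
res <- wilcox.test(x, y)
```

Here w_manual should equal res$statistic, and res$p.value is the quantity compared against 0.05.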
Bootstrap Test
The bootstrap difference in means is a useful hypothesis test for assessing whether two means differ statistically. Each sample is resampled with replacement many times, with each resample having the same size as the original sample. For every iteration, the mean of each resample is calculated and the mean of the second sample's resample is subtracted from that of the first. This yields a distribution of differences in means, which is approximately normal for large samples. A confidence interval is then calculated at the chosen confidence level (95\(\%\) in our case). If the interval does not contain 0, we reject the null hypothesis, as we find a statistically significant difference between the means of the two populations. If the whole interval is positive, the first mean is significantly greater than the second, while if it is entirely negative, the first mean is significantly lower. However, if the confidence interval contains 0, we fail to reject at our confidence level and cannot state that there is a difference between the means of both populations.
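A minimal sketch of this procedure for a single pair of samples follows. The data are made-up toy values, the number of resamples (n_boot) is an assumed value, and the interval is taken as a simple percentile interval; the report does not state which interval type or iteration count was used.

```r
set.seed(1)  # fixed seed so the sketch is reproducible

# Toy samples (hypothetical values, not from the Spotify data)
x <- c(0.61, 0.73, 0.82, 0.88, 0.72, 0.81, 0.79, 0.85)
y <- c(0.55, 0.60, 0.66, 0.58, 0.70, 0.62, 0.59, 0.64)

n_boot <- 2000  # number of bootstrap resamples (assumed value)

# For each iteration, resample each sample with replacement and
# record the difference between the two resampled means
diffs <- replicate(n_boot, {
  mean(sample(x, length(x), replace = TRUE)) -
    mean(sample(y, length(y), replace = TRUE))
})

# 95% percentile interval for the difference in means
ci <- quantile(diffs, c(0.025, 0.975))

# Reject the null of equal means only when the interval excludes 0
reject_null <- ci[[1]] > 0 || ci[[2]] < 0
```

The sign of the interval then indicates the direction of the difference, as described above.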
EDA
The objective of Exploratory Data Analysis is to give us a better understanding of our data sets by obtaining information about the data’s range, characteristics, correlations, patterns, and visual outliers. This step is crucial for making the right assumptions, cross checking results, and making the right conclusions.
The plots below provide us with relevant information about each of the seven features based on each artist. While no final conclusions can be drawn from these, they will allow us to visualize the data and better understand it.
cat("Taylor Swift has", nrow(ts), "tracks.", "\n")
Taylor Swift has 530 tracks.
cat("BTS has", nrow(bts), "tracks.", "\n")
BTS has 294 tracks.
cat("Bad Bunny has", nrow(bb), "tracks.", "\n")
Bad Bunny has 113 tracks.
cat("Drake has", nrow(dk), "tracks.", "\n")
Drake has 308 tracks.
cat("The Weeknd has", nrow(wd), "tracks.", "\n")
The Weeknd has 252 tracks.
The row counts above give the number of tracks by each artist on Spotify. Taylor Swift has the largest catalog, with 530 individual tracks. Bad Bunny has the smallest catalog of the top five streamed artists, with only 113 individual tracks. To help put those numbers into perspective, consider that a typical EP is 4 to 5 songs long, and a typical album is 10 to 12 songs long. Even though Bad Bunny has the smallest catalog of the top five, he still has far more songs than the average Spotify artist. It makes sense that the most streamed artists would be ones that have a significant number of tracks for listeners to play.
mx = cor(df[, c(2, 3, 4, 5, 6, 7, 8, 9, 10)])
ggcorrplot(mx, hc.order = TRUE, type = "lower", lab = TRUE, title = "Correlation between features")
Above is a correlation heatmap of all of the initially collected features. After preliminary EDA, we decided to exclude the features Instrumentalness and Liveness from further EDA because, given the nature of the top five artists, they provide very little useful information. In the heatmap above, the highest correlation is between energy and loudness. This makes sense, as we can expect louder songs to feel more energetic. We can also see a notable positive correlation between energy and valence, suggesting that high-energy songs are more likely to sound positive. There are visible negative correlations between loudness and acousticness, as well as between energy and acousticness. This also makes sense, as acoustic songs tend to sound slower and lower in energy than non-acoustic ones. One of the more interesting findings from this plot is that neither tempo nor speechiness has a strong correlation with any other feature.
ggplot(df, aes_string(x = "Artist_name", y = "Speechiness", fill = "Artist_name")) +
  geom_boxplot() +
  ggtitle("Speechiness Comparison") +
  xlab("Artists") +
  ylab("Speechiness") +
  scale_fill_manual(values = c("lightblue1", "pink1", "burlywood1", "turquoise", "purple3")) +
  theme_minimal()

ggplot(df, aes_string(x = "Artist_name", y = "Loudness", fill = "Artist_name")) +
  geom_boxplot() +
  ggtitle("Loudness Comparison") +
  xlab("Artists") +
  ylab("Loudness") +
  scale_fill_manual(values = c("lightblue1", "pink1", "burlywood1", "turquoise", "purple3")) +
  theme_minimal()
Above are box plots for the features speechiness and loudness. Speechiness describes the amount of spoken word in a Spotify track. We can see that all of the top five artists have relatively low speechiness scores, with Taylor Swift’s median and interquartile range being by far the smallest. Drake has the largest median and interquartile range. This is likely due to the genre of the artists, as Taylor Swift’s songs are almost all pop or country pop, while Drake’s music is closer to rap/hip-hop. In our loudness box plot, we can see that all of the artists have relatively high loudness values. BTS has the highest loudness, followed by Bad Bunny. Drake, Taylor Swift, and The Weeknd all have relatively similar loudness. This is likely also due to the artists’ genres: BTS is a K-pop group, while Bad Bunny is a reggaeton artist.
ggplot(df, aes_string(x = "Artist_name", y = "Instrumentalness", fill = "Artist_name")) +
  geom_boxplot() +
  ggtitle("Instrumentalness Comparison") +
  xlab("Artists") +
  ylab("Instrumentalness") +
  scale_fill_manual(values = c("lightblue1", "pink1", "burlywood1", "turquoise", "purple3")) +
  theme_minimal()
Above is the box plot for instrumentalness. Instrumentalness describes whether a track is instrumental as opposed to vocal. We chose not to include this feature in our analysis because almost all of the values are 0. This is expected, because all of the top five artists are singers.
ggplot(df, aes(x = Energy, fill = Artist_name)) +
  geom_density(alpha = 0.5) +
  labs(title = "Energy Comparison", x = "Energy", y = "Density") +
  scale_fill_manual(values = c("lightblue1", "pink1", "burlywood1", "turquoise", "purple3")) +
  theme_minimal()

ggplot(df, aes(x = Tempo, fill = Artist_name)) +
  geom_density(alpha = 0.5) +
  labs(title = "Tempo Comparison", x = "Tempo", y = "Density") +
  scale_fill_manual(values = c("lightblue1", "pink1", "burlywood1", "turquoise", "purple3")) +
  theme_minimal()
Above are density plots for the features energy and tempo. This type of plot allows us to view the distributions of the features for each artist on top of each other. When looking at the energy densities, we can see right away that BTS has a much higher energy distribution than the other artists. They also have a lower standard deviation than the other distributions. Bad Bunny has the second highest energy distribution. Drake has the lowest energy distribution. One interesting thing to note is that Taylor Swift appears to have the highest standard deviation among the top five artists.
ggplot(data = df, aes(x = Artist_name, y = Danceability, fill = Artist_name)) +
  geom_violin(width = 0.5) +
  geom_boxplot(width = 0.9, color = "grey", alpha = 0.2) +
  scale_fill_viridis(discrete = TRUE) +
  labs(title = "Danceability Comparison", x = "Artists", y = "Danceability")

ggplot(data = df, aes(x = Artist_name, y = Acousticness, fill = Artist_name)) +
  geom_violin(width = 0.5) +
  geom_boxplot(width = 0.9, color = "grey", alpha = 0.2) +
  scale_fill_viridis(discrete = TRUE) +
  labs(title = "Acousticness Comparison", x = "Artists", y = "Acousticness")

ggplot(data = df, aes(x = Artist_name, y = Valence, fill = Artist_name)) +
  geom_violin(width = 0.5) +
  geom_boxplot(width = 0.9, color = "grey", alpha = 0.2) +
  scale_fill_viridis(discrete = TRUE) +
  labs(title = "Valence Comparison", x = "Artists", y = "Valence")
Above is a violin plot with an overlaid box plot for acousticness. Again, we can see that BTS has a much lower acousticness score and a significantly narrower interquartile range. We can also see that Taylor Swift has a much wider interquartile range than any of the other top five artists. Another interesting thing we can take away from this plot is the visual representation of the scale of the artists’ catalogs. The density of Taylor Swift’s plot appears much greater than that of artists with smaller catalogs like BTS and Bad Bunny.
plot_ly(df, x = ~Energy, y = ~Loudness, z = ~Artist_name, type = 'scatter3d', color = ~Artist_name, mode = 'markers')
Above is the 3D plot of loudness and energy by artist. As the correlation heatmap suggested, there is a positive correlation between energy and loudness for all artists.
plot_ly(df, x = ~Loudness, y = ~Acousticness, z = ~Artist_name, type = 'scatter3d', color = ~Artist_name, mode = 'markers')
Although the correlation coefficient between loudness and acousticness is -0.56, the 3D plot does not show a strong correlation visually.
Hypothesis Testing
Functions for bootstrap tests
# Function to bootstrap the mean difference of two samples for the test
boot.test = function(x1, x2, feature, iter) {
  # One slot per bootstrap iteration
  boot1 = numeric(iter)
  boot2 = numeric(iter)
  for (i in 1:iter) {
    boot1[i] = mean(sample(x1[[feature]], length(x1[[feature]]), replace = TRUE))
    boot2[i] = mean(sample(x2[[feature]], length(x2[[feature]]), replace = TRUE))
  }
  return(boot1 - boot2)
}

# Function to check if 0 is within the range
contains_zero <- function(quantiles) {
  return(quantiles[1] <= 0 && quantiles[2] >= 0)
}
These two functions make the bootstrap tests easier to run. The function boot.test returns the vector of differences between the bootstrapped means of the two samples. From the quantiles of that vector, the contains_zero function checks whether the confidence interval at a given level includes zero.
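One way these helpers might be applied to every pair of artists is sketched below. The data frame toy, its artist labels, the choice of 1000 iterations, and the 2.5% and 97.5% quantiles are assumptions made for illustration; boot.test and contains_zero follow the definitions above, with the result vectors sized by iter.

```r
set.seed(42)  # fixed seed so the sketch is reproducible

# Helpers as described above (result vectors sized by the iteration count)
boot.test <- function(x1, x2, feature, iter) {
  boot1 <- numeric(iter)
  boot2 <- numeric(iter)
  for (i in seq_len(iter)) {
    boot1[i] <- mean(sample(x1[[feature]], length(x1[[feature]]), replace = TRUE))
    boot2[i] <- mean(sample(x2[[feature]], length(x2[[feature]]), replace = TRUE))
  }
  boot1 - boot2
}

contains_zero <- function(quantiles) {
  quantiles[1] <= 0 && quantiles[2] >= 0
}

# Toy stand-in for df: three hypothetical artists, one feature
artists <- c("A", "B", "C")
toy <- data.frame(
  Artist_name = rep(artists, each = 50),
  Tempo = c(rnorm(50, 120, 10), rnorm(50, 120, 10), rnorm(50, 150, 10))
)

# All unordered pairs of artists, one pair per column
pairs <- combn(artists, 2)

# Bootstrap each pair and record the 95% percentile interval
results <- do.call(rbind, lapply(seq_len(ncol(pairs)), function(j) {
  a <- pairs[1, j]
  b <- pairs[2, j]
  d <- boot.test(toy[toy$Artist_name == a, ], toy[toy$Artist_name == b, ], "Tempo", 1000)
  ci <- quantile(d, c(0.025, 0.975))
  data.frame(artist1 = a, artist2 = b,
             lower = ci[[1]], upper = ci[[2]],
             includes_zero = contains_zero(ci))
}))

print(results)
```

Pairs whose includes_zero value is FALSE are the ones for which the null hypothesis would be rejected. With five artists, combn yields the ten pairs listed in the sections below.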
Acousticness
T-Test
pairwise.t.test(df$Acousticness, df$Artist_name)
Pairwise comparisons using t tests with pooled SD
data: df$Acousticness and df$Artist_name
Bad Bunny BTS Drake Taylor Swift
BTS 0.00287 - - -
Drake 1.00000 7.3e-06 - -
Taylor Swift 0.01984 < 2e-16 0.00053 -
The Weeknd 1.00000 2.3e-07 1.00000 0.02917
P value adjustment method: holm
From this pairwise T-Test with a 95% confidence level, we reject the null hypothesis on pairs that have p-value less than 0.05. That is, there is a significant difference in the means of acousticness between the following pairs of artists:
BTS and Bad Bunny
BTS and Drake
BTS and Taylor Swift
BTS and The Weeknd
Bad Bunny and Taylor Swift
Drake and Taylor Swift
Taylor Swift and The Weeknd
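The remaining pairs have p-values higher than 0.05, so we fail to reject the null hypothesis. That is, there is no significant difference in the means of acousticness between the following pairs of artists:
Bad Bunny and Drake
Bad Bunny and The Weeknd
Drake and The Weeknd
Wilcoxon Rank Sum Test
pairwise.wilcox.test(df$Acousticness, df$Artist_name)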
Pairwise comparisons using Wilcoxon rank sum test with continuity correction
data: df$Acousticness and df$Artist_name
Bad Bunny BTS Drake Taylor Swift
BTS 7.2e-11 - - -
Drake 0.67 1.8e-10 - -
Taylor Swift 1.00 6.4e-15 0.29 -
The Weeknd 1.00 1.7e-09 1.00 0.67
P value adjustment method: holm
From this pairwise Wilcoxon Rank Sum Test with a 95% confidence level, we reject the null hypothesis on pairs that have p-value less than 0.05. That is, there is a significant difference in the means of acousticness between the following pairs of artists:
BTS and Bad Bunny
BTS and Drake
BTS and Taylor Swift
BTS and The Weeknd
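The remaining pairs have p-values higher than 0.05, so we fail to reject the null hypothesis. That is, there is no significant difference in acousticness between the following pairs of artists:
Bad Bunny and Drake
Bad Bunny and Taylor Swift
Bad Bunny and The Weeknd
Drake and Taylor Swift
Drake and The Weeknd
Taylor Swift and The Weeknd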
From this Bootstrap Test with a 95% confidence level, we reject the null hypothesis on pairs where the confidence interval doesn’t include 0. That is, there is a significant difference in the means of acousticness between the following pairs of artists:
Taylor Swift and Bad Bunny
Taylor Swift and Drake
Taylor Swift and BTS
Taylor Swift and The Weeknd
BTS and Bad Bunny
BTS and Drake
BTS and The Weeknd
The remaining pairs below have confidence intervals including 0, so we fail to reject the null hypothesis. That is, there is no significant difference in the means of acousticness.
Bad Bunny and Drake
Bad Bunny and The Weeknd
Drake and The Weeknd
Danceability
T-Test
pairwise.t.test(df$Danceability, df$Artist_name)
Pairwise comparisons using t tests with pooled SD
data: df$Danceability and df$Artist_name
Bad Bunny BTS Drake Taylor Swift
BTS < 2e-16 - - -
Drake 3.3e-11 0.0041 - -
Taylor Swift < 2e-16 0.0082 2.8e-09 -
The Weeknd < 2e-16 1.2e-12 < 2e-16 2.4e-08
P value adjustment method: holm
From this pairwise T-Test with a 95% confidence level, we reject the null hypothesis on pairs that have a p-value less than 0.05. Every p-value in the table is below 0.05, so there is a significant difference in the means of danceability between all ten pairs of artists.
Wilcoxon Rank Sum Test
pairwise.wilcox.test(df$Danceability, df$Artist_name)
Pairwise comparisons using Wilcoxon rank sum test with continuity correction
data: df$Danceability and df$Artist_name
Bad Bunny BTS Drake Taylor Swift
BTS < 2e-16 - - -
Drake 3.4e-09 0.0089 - -
Taylor Swift < 2e-16 0.0089 1.8e-07 -
The Weeknd < 2e-16 2.1e-06 3.8e-12 0.0024
P value adjustment method: holm
From this pairwise Wilcoxon Rank Sum Test with a 95% confidence level, we reject the null hypothesis on pairs that have a p-value less than 0.05. Every p-value in the table is below 0.05, so there is a significant difference in danceability between all ten pairs of artists.
From this Bootstrap Test with a 95% confidence level, we reject the null hypothesis on pairs where the confidence interval doesn’t include 0. That is, there is a significant difference in the means of danceability between the following pairs of artists:
BTS and Bad Bunny
BTS and Drake
BTS and Taylor Swift
BTS and The Weeknd
Taylor Swift and Bad Bunny
Taylor Swift and Drake
Taylor Swift and The Weeknd
Drake and Bad Bunny
Drake and The Weeknd
Bad Bunny and The Weeknd
Thus, all pairs show a significant difference.
Energy
T-Test
pairwise.t.test(df$Energy, df$Artist_name)
Pairwise comparisons using t tests with pooled SD
data: df$Energy and df$Artist_name
Bad Bunny BTS Drake Taylor Swift
BTS 9.9e-08 - - -
Drake 8.3e-09 < 2e-16 - -
Taylor Swift 3.4e-06 < 2e-16 0.05909 -
The Weeknd 0.00014 < 2e-16 0.05190 0.54488
P value adjustment method: holm
From this pairwise T-Test with a 95% confidence level, we reject the null hypothesis on pairs that have p-value less than 0.05. That is, there is a significant difference in the means of energy between the following pairs of artists:
BTS and Bad Bunny
BTS and Drake
BTS and Taylor Swift
BTS and The Weeknd
Bad Bunny and Drake
Bad Bunny and Taylor Swift
Bad Bunny and The Weeknd
The remaining pairs below have p-values higher than 0.05, so we fail to reject the null hypothesis. That is, there is no significant difference in the means of energy.
Drake and Taylor Swift
Drake and The Weeknd
Taylor Swift and The Weeknd
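Every pairwise output in this section ends with "P value adjustment method: holm", meaning the reported p-values have already been adjusted for multiple comparisons. The sketch below shows how the Holm adjustment works on four hypothetical raw p-values (not values from our data) and checks the manual calculation against base R's p.adjust.

```r
# Hypothetical raw p-values for four pairwise comparisons
p_raw <- c(0.010, 0.040, 0.030, 0.005)

m <- length(p_raw)
ord <- order(p_raw)  # indices from the smallest to the largest p-value

# Step-down multipliers: m for the smallest p-value, then m - 1, ..., 1
stepped <- p_raw[ord] * (m - seq_len(m) + 1)

# Enforce monotonicity, cap at 1, and restore the original order
p_holm_manual <- numeric(m)
p_holm_manual[ord] <- pmin(1, cummax(stepped))

# Reference computation
p_holm <- p.adjust(p_raw, method = "holm")

print(p_holm)
```

In this toy case the raw values 0.040 and 0.030 are below 0.05, but their adjusted values are 0.06, so they would no longer count as significant. The cap at 1 is also why adjusted p-values of exactly 1 can appear in the tables.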
Wilcoxon Rank Sum Test
pairwise.wilcox.test(df$Energy, df$Artist_name)
Pairwise comparisons using Wilcoxon rank sum test with continuity correction
data: df$Energy and df$Artist_name
Bad Bunny BTS Drake Taylor Swift
BTS 1.3e-14 - - -
Drake 7.7e-11 < 2e-16 - -
Taylor Swift 3.3e-05 < 2e-16 0.04077 -
The Weeknd 0.00013 < 2e-16 0.00448 0.49284
P value adjustment method: holm
From this pairwise Wilcoxon Rank Sum Test with a 95% confidence level, we reject the null hypothesis on pairs that have p-value less than 0.05. That is, there is a significant difference in the means of energy between the following pairs of artists:
BTS and Bad Bunny
BTS and Drake
BTS and Taylor Swift
BTS and The Weeknd
Bad Bunny and Drake
Bad Bunny and Taylor Swift
Bad Bunny and The Weeknd
Drake and Taylor Swift
Drake and The Weeknd
The remaining pair, Taylor Swift and The Weeknd, has a p-value higher than 0.05, so we fail to reject the null hypothesis. That is, there is no significant difference in energy between these two artists.
From this Bootstrap Test with a 95% confidence level, we reject the null hypothesis on pairs where the confidence interval doesn’t include 0. That is, there is a significant difference in the means of energy between the following pairs of artists:
Taylor Swift and Bad Bunny
Taylor Swift and Drake
Taylor Swift and BTS
BTS and Bad Bunny
BTS and Drake
BTS and The Weeknd
Bad Bunny and Drake
Bad Bunny and The Weeknd
Drake and The Weeknd
The remaining pair below has a confidence interval including 0, so we fail to reject the null hypothesis. That is, there is no significant difference in the means of energy.
Taylor Swift and The Weeknd
Loudness
T-Test
pairwise.t.test(df$Loudness, df$Artist_name)
Pairwise comparisons using t tests with pooled SD
data: df$Loudness and df$Artist_name
Bad Bunny BTS Drake Taylor Swift
BTS 0.0097 - - -
Drake 2.0e-11 < 2e-16 - -
Taylor Swift 1.3e-06 < 2e-16 0.0030 -
The Weeknd 1.9e-14 < 2e-16 0.1199 6.6e-06
P value adjustment method: holm
From this pairwise T-Test with a 95% confidence level, we reject the null hypothesis on pairs that have p-value less than 0.05. That is, there is a significant difference in the means of loudness between the following pairs of artists:
BTS and Bad Bunny
BTS and Drake
BTS and Taylor Swift
BTS and The Weeknd
Bad Bunny and Drake
Bad Bunny and Taylor Swift
Bad Bunny and The Weeknd
Drake and Taylor Swift
Taylor Swift and The Weeknd
The remaining pair below has a p-value higher than 0.05, so we fail to reject the null hypothesis. That is, there is no significant difference in the means of loudness.
Drake and The Weeknd
Wilcoxon Rank Sum Test
pairwise.wilcox.test(df$Loudness, df$Artist_name)
Pairwise comparisons using Wilcoxon rank sum test with continuity correction
data: df$Loudness and df$Artist_name
Bad Bunny BTS Drake Taylor Swift
BTS 1.7e-09 - - -
Drake < 2e-16 < 2e-16 - -
Taylor Swift 1.3e-07 < 2e-16 0.00048 -
The Weeknd < 2e-16 < 2e-16 0.63434 0.00088
P value adjustment method: holm
From this pairwise Wilcoxon Rank Sum Test with a 95% confidence level, we reject the null hypothesis on pairs that have p-value less than 0.05. That is, there is a significant difference in the means of loudness between the following pairs of artists:
BTS and Bad Bunny
BTS and Drake
BTS and Taylor Swift
BTS and The Weeknd
Bad Bunny and Drake
Bad Bunny and Taylor Swift
Bad Bunny and The Weeknd
Drake and Taylor Swift
Taylor Swift and The Weeknd
The remaining pair, Drake and The Weeknd, has a p-value higher than 0.05, so we fail to reject the null hypothesis. That is, there is no significant difference in loudness between these two artists.
From this Bootstrap Test with a 95% confidence level, we reject the null hypothesis on pairs where the confidence interval doesn’t include 0. That is, there is a significant difference in the means of loudness between the following pairs of artists:
BTS and Bad Bunny
BTS and Drake
BTS and Taylor Swift
BTS and The Weeknd
Bad Bunny and Drake
Bad Bunny and Taylor Swift
Bad Bunny and The Weeknd
Drake and Taylor Swift
Taylor Swift and The Weeknd
The remaining pair below has a confidence interval including 0, so we fail to reject the null hypothesis. That is, there is no significant difference in the means of loudness.
Drake and The Weeknd
Speechiness
T-Test
pairwise.t.test(df$Speechiness, df$Artist_name)
Pairwise comparisons using t tests with pooled SD
data: df$Speechiness and df$Artist_name
Bad Bunny BTS Drake Taylor Swift
BTS 0.036 - - -
Drake 2.3e-10 3.7e-08 - -
Taylor Swift 8.3e-10 < 2e-16 < 2e-16 -
The Weeknd 1.1e-05 < 2e-16 < 2e-16 0.067
P value adjustment method: holm
From this pairwise T-Test with a 95% confidence level, we reject the null hypothesis on pairs that have p-value less than 0.05. That is, there is a significant difference in the means of speechiness between the following pairs of artists:
BTS and Bad Bunny
BTS and Drake
BTS and Taylor Swift
BTS and The Weeknd
Bad Bunny and Drake
Bad Bunny and Taylor Swift
Bad Bunny and The Weeknd
Drake and Taylor Swift
Drake and The Weeknd
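The remaining pair, Taylor Swift and The Weeknd, has a p-value higher than 0.05, so we fail to reject the null hypothesis. That is, there is no significant difference in the means of speechiness between these two artists.
Wilcoxon Rank Sum Test
pairwise.wilcox.test(df$Speechiness, df$Artist_name)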
Pairwise comparisons using Wilcoxon rank sum test with continuity correction
data: df$Speechiness and df$Artist_name
Bad Bunny BTS Drake Taylor Swift
BTS 0.18 - - -
Drake 2.2e-07 5.2e-08 - -
Taylor Swift < 2e-16 < 2e-16 < 2e-16 -
The Weeknd 9.5e-11 < 2e-16 < 2e-16 4.4e-15
P value adjustment method: holm
From this pairwise Wilcoxon Rank Sum Test with a 95% confidence level, we reject the null hypothesis on pairs that have p-value less than 0.05. That is, there is a significant difference in the means of speechiness between the following pairs of artists:
BTS and Drake
BTS and Taylor Swift
BTS and The Weeknd
Bad Bunny and Drake
Bad Bunny and Taylor Swift
Bad Bunny and The Weeknd
Drake and Taylor Swift
Drake and The Weeknd
Taylor Swift and The Weeknd
The remaining pair, BTS and Bad Bunny, has a p-value higher than 0.05, so we fail to reject the null hypothesis. That is, there is no significant difference in speechiness between these two artists.
From this Bootstrap Test with a 95% confidence level, we reject the null hypothesis on pairs where the confidence interval doesn’t include 0. That is, there is a significant difference in the means of speechiness between the following pairs of artists:
BTS and Bad Bunny
BTS and Drake
BTS and Taylor Swift
BTS and The Weeknd
Taylor Swift and Bad Bunny
Taylor Swift and Drake
Taylor Swift and The Weeknd
Drake and Bad Bunny
Drake and The Weeknd
Bad Bunny and The Weeknd
Thus, all pairs show a significant difference.
Tempo
T-Test
pairwise.t.test(df$Tempo, df$Artist_name)
Pairwise comparisons using t tests with pooled SD
data: df$Tempo and df$Artist_name
Bad Bunny BTS Drake Taylor Swift
BTS 1 - - -
Drake 1 1 - -
Taylor Swift 1 1 1 -
The Weeknd 1 1 1 1
P value adjustment method: holm
From this pairwise T-Test with a 95% confidence level, we would reject the null hypothesis on pairs that have a p-value less than 0.05. Every adjusted p-value here is 1, so there is no significant difference in the means of tempo between any of the following pairs of artists:
BTS and Bad Bunny
BTS and Drake
BTS and Taylor Swift
BTS and The Weeknd
Taylor Swift and Bad Bunny
Taylor Swift and Drake
Taylor Swift and The Weeknd
Drake and Bad Bunny
Drake and The Weeknd
Bad Bunny and The Weeknd
Thus, we fail to reject the null hypothesis for every pair of artists.
Wilcoxon Rank Sum Test
pairwise.wilcox.test(df$Tempo, df$Artist_name)
Pairwise comparisons using Wilcoxon rank sum test with continuity correction
data: df$Tempo and df$Artist_name
Bad Bunny BTS Drake Taylor Swift
BTS 1 - - -
Drake 1 1 - -
Taylor Swift 1 1 1 -
The Weeknd 1 1 1 1
P value adjustment method: holm
From this pairwise Wilcoxon Rank Sum Test with a 95% confidence level, we would reject the null hypothesis on pairs that have a p-value less than 0.05. Every adjusted p-value here is 1, so there is no significant difference in tempo between any of the following pairs of artists:
BTS and Bad Bunny
BTS and Drake
BTS and Taylor Swift
BTS and The Weeknd
Taylor Swift and Bad Bunny
Taylor Swift and Drake
Taylor Swift and The Weeknd
Drake and Bad Bunny
Drake and The Weeknd
Bad Bunny and The Weeknd
Thus, we fail to reject the null hypothesis for every pair of artists.
From this Bootstrap Test with a 95% confidence level, we would reject the null hypothesis on pairs where the confidence interval doesn’t include 0. Every interval here includes 0, so there is no significant difference in the means of tempo between any of the following pairs of artists:
BTS and Bad Bunny
BTS and Drake
BTS and Taylor Swift
BTS and The Weeknd
Taylor Swift and Bad Bunny
Taylor Swift and Drake
Taylor Swift and The Weeknd
Drake and Bad Bunny
Drake and The Weeknd
Bad Bunny and The Weeknd
Thus, we fail to reject the null hypothesis for every pair of artists.
Valence
T-Test
pairwise.t.test(df$Valence, df$Artist_name)
Pairwise comparisons using t tests with pooled SD
data: df$Valence and df$Artist_name
Bad Bunny BTS Drake Taylor Swift
BTS 0.03659 - - -
Drake 4.9e-09 < 2e-16 - -
Taylor Swift 0.00011 < 2e-16 0.00193 -
The Weeknd 2.3e-12 < 2e-16 0.07401 1.2e-06
P value adjustment method: holm
From this pairwise T-Test with a 95% confidence level, we reject the null hypothesis on pairs that have p-value less than 0.05. That is, there is a significant difference in the means of valence between the following pairs of artists:
BTS and Bad Bunny
BTS and Drake
BTS and Taylor Swift
BTS and The Weeknd
Bad Bunny and Taylor Swift
Bad Bunny and Drake
Bad Bunny and The Weeknd
Drake and Taylor Swift
Taylor Swift and The Weeknd
The pair below has a p-value higher than 0.05, so we fail to reject the null hypothesis. That is, there is no significant difference in the means of valence.
Drake and The Weeknd
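The split between significant and non-significant pairs can be checked mechanically. The sketch below types the Holm-adjusted p-values from the Valence T-Test output into a matrix by hand and flags each pair against the 0.05 threshold. The helper name `split_pairs` is hypothetical, and the entries printed as `< 2e-16` are entered as `2e-16`, an upper bound rather than the exact value. With a live result, the same matrix should be available as the `p.value` element of the object returned by `pairwise.t.test`.

```r
# Flag each artist pair as significant or not from a lower-triangular
# matrix of adjusted p-values (NA marks cells that hold no comparison).
split_pairs <- function(p, alpha = 0.05) {
  idx <- which(!is.na(p), arr.ind = TRUE)
  pairs <- data.frame(
    artist_1 = rownames(p)[idx[, "row"]],
    artist_2 = colnames(p)[idx[, "col"]],
    p_adj = p[idx],
    stringsAsFactors = FALSE
  )
  pairs$significant <- pairs$p_adj < alpha
  pairs
}

artists <- c("Bad Bunny", "BTS", "Drake", "Taylor Swift", "The Weeknd")
p <- matrix(NA_real_, nrow = 4, ncol = 4,
            dimnames = list(artists[-1], artists[-5]))

# Values copied from the printed Valence T-Test output; "< 2e-16" is
# entered as 2e-16 (an upper bound, not the exact p-value).
p["BTS", "Bad Bunny"] <- 0.03659
p["Drake", c("Bad Bunny", "BTS")] <- c(4.9e-09, 2e-16)
p["Taylor Swift", c("Bad Bunny", "BTS", "Drake")] <- c(0.00011, 2e-16, 0.00193)
p["The Weeknd", ] <- c(2.3e-12, 2e-16, 0.07401, 1.2e-06)

pairs <- split_pairs(p)
print(pairs[order(pairs$p_adj), ])
```

Nine of the ten pairs are flagged as significant, and the only pair that is not is Drake and The Weeknd, which agrees with the lists above.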
Wilcoxon Rank Sum Test
pairwise.wilcox.test(df$Valence, df$Artist_name)
Pairwise comparisons using Wilcoxon rank sum test with continuity correction
data: df$Valence and df$Artist_name
             Bad Bunny BTS     Drake   Taylor Swift
BTS          0.03206   -       -       -
Drake        1.2e-07   < 2e-16 -       -
Taylor Swift 0.00081   < 2e-16 0.00081 -
The Weeknd   5.8e-10   < 2e-16 0.03206 1.4e-07
P value adjustment method: holm
From this pairwise Wilcoxon Rank Sum Test with a 95% confidence level, we reject the null hypothesis on pairs that have a p-value less than 0.05. Every pair in the output above meets that threshold, including Drake and The Weeknd (p = 0.03206), so this test finds a significant difference in valence between all ten pairs of artists.
From this Bootstrap Test with a 95% confidence level, we reject the null hypothesis on pairs where the confidence interval doesn’t include 0. That is, there is a significant difference in the means of valence between the following pairs of artists:
Taylor Swift and Bad Bunny
Taylor Swift and Drake
Taylor Swift and BTS
Taylor Swift and The Weeknd
BTS and Bad Bunny
BTS and Drake
BTS and The Weeknd
Bad Bunny and Drake
Bad Bunny and The Weeknd
The pair below has a confidence interval that includes 0, so we fail to reject the null hypothesis. That is, there is no significant difference in the means of valence.
Drake and The Weeknd
Conclusion
Each of these hypothesis tests compares every pair of artists on each feature and states whether or not the pair differs significantly at a 95% confidence level. Since the three tests sometimes disagree on the same feature, we take the majority vote across the tests as the more likely result (e.g., if two tests find a statistical difference between two artists and one does not, we conclude that there seems to be a significant difference between those artists). This rule is not a rigorous conclusion, and further statistical research would be needed to determine whether these claims are fully correct. However, we believe a result is generally more likely to hold when two tests agree on it.
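The majority-vote rule can be written down in a few lines. The sketch below is only an illustration: the function name `majority_vote` and the data frame layout are hypothetical. The two example rows use the Valence decisions reported above, where Drake and The Weeknd are significant under the Wilcoxon test only, while BTS and Bad Bunny are significant under all three tests.

```r
# Combine three reject/fail-to-reject decisions per pair by majority vote.
# Each argument is a logical vector: TRUE means that test found a
# significant difference for the pair.
majority_vote <- function(t_test, wilcoxon, bootstrap) {
  votes <- cbind(t_test, wilcoxon, bootstrap)
  rowSums(votes) >= 2
}

# Valence decisions for two pairs, taken from the results reported above.
decisions <- data.frame(
  pair = c("Drake vs The Weeknd", "BTS vs Bad Bunny"),
  t_test = c(FALSE, TRUE),
  wilcoxon = c(TRUE, TRUE),
  bootstrap = c(FALSE, TRUE),
  stringsAsFactors = FALSE
)
decisions$majority <- majority_vote(
  decisions$t_test, decisions$wilcoxon, decisions$bootstrap
)
print(decisions)
```

Under this rule, Drake and The Weeknd are treated as not significantly different in valence, because only one of the three tests rejects the null hypothesis.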
Tempo is the only feature for which we fail to reject every null hypothesis, so we cannot state that any pair of artists differs significantly in tempo. This is an interesting finding: the five most streamed artists on Spotify, from all around the world, have similar tempos. It suggests that many listeners like songs whose tempo falls within the range these artists use. Spotify could take this into account when implementing its song recommendation algorithm, and upcoming artists may want to consider it when looking for outreach strategies or popularity gains. We cannot state with confidence that tempo is deterministic in an artist’s popularity, and further statistical research would be needed to make that claim, but staying within this range would be a “safe bet”.
All the other features (Acousticness, Danceability, Energy, Loudness, and Valence) differ for a great number of pairs of artists, so we can state with confidence that most artists are statistically different in these features. However, our results show more commonality among the pairs formed from Drake, Bad Bunny, and The Weeknd than among any other artists. This is expected for Drake and The Weeknd, who sing in similar genres, but it is an interesting finding for Bad Bunny, who sings mainly reggaeton rather than rap or pop.
While this information can be useful for upcoming artists and perhaps for Spotify developers, we suggest that upcoming artists not try to copy any of these artists or become them, but rather draw inspiration from them and bring their own selves to the table. While statistics help us make inferences in any field, music is still an art, and everyone should develop and display their own personality through it.